DS002R Projects Overview

December 9, 2024

Maren Rusk

NYT Headlines Analysis

Process

  • Data on New York Times headlines from RTextTools package
    • Date, title, subject of headlines from 1996-2006
  • Compared total NYT headline mentions of Bill Clinton and George W. Bush during their consecutive terms (Clinton 1997-2000, Bush 2001-2004)
  • Examined headline length (number of words) across months, for headlines with and without punctuation
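
The two measurements above could be sketched roughly as follows. This is an illustrative Python sketch, not the code used for the project: the helper names (`word_count`, `has_punctuation`, `mentions`) and the sample headlines are hypothetical, and it assumes "punctuation" means any ASCII punctuation character and that a "mention" is a case-insensitive surname match.

```python
import string


def word_count(title: str) -> int:
    """Number of whitespace-separated words in a headline."""
    return len(title.split())


def has_punctuation(title: str) -> bool:
    """True if the headline contains any ASCII punctuation character (assumed definition)."""
    return any(ch in string.punctuation for ch in title)


def mentions(title: str, surname: str) -> bool:
    """Case-insensitive substring match on a surname (assumed definition of a mention)."""
    return surname.lower() in title.lower()


# Hypothetical headlines for illustration; not taken from the dataset.
sample = [
    "Clinton Signs Budget Bill",
    "Bush, in Speech, Urges Tax Cuts",
    "Markets Rally",
]

for title in sample:
    print(word_count(title), has_punctuation(title), title)
```

A plain substring match would also count words such as "ambush", so a real analysis would probably need a stricter pattern.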

Presidential Headlines

This column plot compares the total number of NYT headlines mentioning George W. Bush and Bill Clinton during their consecutive presidential terms (Clinton 1997-2000 and Bush 2001-2004). The x-axis shows the two presidents, while the y-axis shows headline count. Fill color distinguishes the years, so each president's column has four segments, one for each year in office. Overall, the column for Bush is much taller than that for Clinton, at around 65 headlines to Clinton's 20. The year with the largest number of presidential headlines is 2001.

Takeaways

  • Bush was mentioned in far more headlines than Clinton (about 65 vs. 20)
    • Total headline volume was similar: 1170 headlines from Clinton’s term, 1154 from Bush’s
  • No noticeable pattern of an increase or decrease in headlines between beginning and end of term
  • Bush had a large surge of headlines in 2001 (9/11)

Headline Length and Punctuation

This plot uses a series of boxplots to show the distribution of the number of words per headline for each month, for headlines with and without punctuation. The x-axis represents month, while the y-axis represents the number of words per headline. For each month, there is a red boxplot labeled 'FALSE' for headlines without punctuation, and a blue boxplot labeled 'TRUE' for headlines with punctuation. By month, the distributions do not show a clear trend, but the ranges for February and November with punctuation appear to be the largest overall excluding outliers, while May has both the longest headline (largest outlier) and one of the shortest, at one word. The boxplots for headlines with punctuation mostly show higher median word counts than those without punctuation.

Takeaways

  • Headlines with punctuation are wordier
    • Median word count with punctuation >= median without punctuation in every month; upper quartile is higher in every month but August
  • Headlines with punctuation have more outliers
  • February and November have the largest range of headline length, excluding outliers
  • May has both the shortest headline (1 word) and the longest (24 words)

Gender Stereotypes

Data

  • Experimental study on 5- to 7-year-old children, restricted to trials with trait "smart" and target "adults"
   subject gender age trait target stereotype high_achieve_caution
1        1   male   6 smart adults       0.75                 0.25
2        2   male   5 smart adults       1.00                 1.00
3        3   male   7 smart adults       0.25                 0.25
4        4 female   5 smart adults       1.00                 1.00
5        5   male   5 smart adults       0.25                 0.75
6        6   male   7 smart adults       0.75                 0.50
7        7 female   6 smart adults       0.00                 0.50
8        8 female   6 smart adults       0.50                 0.75
9        9   male   5 smart adults       0.75                 0.50
10      10   male   5 smart adults       1.00                 0.50

Research question: Are young females less likely than young males to perceive adults of their own gender as smart?

H0: μ_male - μ_female = 0, Ha: μ_male - μ_female > 0

  • Observed difference in mean stereotype proportion (male minus female) is 0.0972.

P-Value: 0.0302

This histogram displays a null sampling distribution of the difference in mean stereotype proportion between genders, generated from randomized data. The x-axis represents the difference in mean stereotype proportion by gender, and the y-axis represents count. The shape of the histogram resembles a normal curve, with a slight gap at x = 0. A red vertical line intersects the x-axis at x = 0.0972, the observed mean difference from the actual experimental data.

Reject Null Hypothesis: This dataset presents statistically significant evidence (p = 0.0302) that 5- to 7-year-old males are more likely than 5- to 7-year-old females to perceive adults of their own gender as “smart”.
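
A randomization test of this kind can be sketched as below. This is a hedged Python illustration, not the code used for the project: it assumes the null distribution was built by shuffling gender labels, the function name `permutation_p_value` and its `reps` and `seed` parameters are hypothetical, and it uses only the ten rows displayed on the Data slide, so it will not reproduce the reported 0.0972 or 0.0302.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(male, female, reps=5000, seed=1):
    """One-sided test of H0: mu_male - mu_female = 0 against Ha: difference > 0."""
    rng = random.Random(seed)
    observed = mean(male) - mean(female)
    pooled = list(male) + list(female)
    n_male = len(male)
    at_least_as_large = 0
    for _ in range(reps):
        # Shuffling the pooled values breaks any link between gender and score.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_male]) - mean(pooled[n_male:])
        if diff >= observed:
            at_least_as_large += 1
    # p-value: share of shuffled differences at least as large as the observed one.
    return observed, at_least_as_large / reps


# Stereotype scores from the ten rows shown on the Data slide only (not the full dataset).
male = [0.75, 1.00, 0.25, 0.25, 0.75, 0.75, 1.00]
female = [1.00, 0.00, 0.50]

observed, p = permutation_p_value(male, female)
print(f"observed difference = {observed:.4f}, one-sided p = {p:.4f}")
```

The p-value is one-sided because the alternative hypothesis only considers differences greater than zero.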

Ear Absorbance by Age

This plot displays mean percent absorbance by frequency (in hertz) for each age group in the 2023 study by Sun et al. The x-axis represents frequency, the y-axis represents mean absorbance, and each age group (grouped by 10 years) is displayed on a separate color-coded line. For the most part, the lines increase up to about 1000 Hz, remain roughly constant between 1000 and 4000 Hz, and then decrease. The 80-89 age group shows a large dip between 1000 and 4000 Hz in comparison to the other lines, with a local minimum at around 2000 Hz, but still reaches a high peak at around 4000 Hz.

Takeaways

  • Most age groups have very similar absorbance trajectories
  • 80-89 has a very different curve, with a large drop in mean absorbance at around 2000 Hz, but it has the highest mean absorbance from 400-800 Hz and 4000-5000 Hz
  • 70-79 curve shows no sign of absorbance decline
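
The grouping behind a plot like this can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the project's code: the record layout `(age, frequency_hz, absorbance_percent)`, the helper names `age_group` and `mean_absorbance`, and the sample values are all hypothetical.

```python
from collections import defaultdict


def age_group(age: int) -> str:
    """Label an age with its 10-year group, e.g. 83 -> '80-89'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def mean_absorbance(records):
    """Mean absorbance for each (age group, frequency) pair.

    records: iterable of (age, frequency_hz, absorbance_percent) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for age, freq, absorbance in records:
        key = (age_group(age), freq)
        sums[key][0] += absorbance
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


# Hypothetical measurements for illustration; not values from the study.
records = [(83, 2000, 40.0), (87, 2000, 50.0), (72, 2000, 70.0)]
means = mean_absorbance(records)
print(means)
```

Each age group's line in the plot would then connect its mean values across frequencies.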

Thank You!